Temporal Index Sharding for Space-time Efficiency in Archive Search



Abstract

Time-travel queries that couple temporal constraints with keyword queries are useful in searching large-scale archives of time-evolving content such as the Web, document collections, and wikis. Typical approaches for efficient evaluation of these queries involve \emph{slicing} along the time axis either the entire collection~\cite{253349} or individual index lists~\cite{kberberi:sigir2007}. Neither method is satisfactory: each trades index compactness against processing efficiency, leaving the index either too big or too slow. We present a novel index organization scheme that \emph{shards} the index with \emph{zero increase in index size} while still minimizing the cost of reading index entries during query processing. Based on the optimal sharding thus obtained, we develop a practically efficient sharding that takes into account the different costs of random and sequential accesses. Our algorithm carefully merges shards from the optimal solution, allowing a few extra sequential accesses in exchange for a significant reduction in random accesses. Finally, we empirically establish the effectiveness of our sharding scheme via detailed experiments over the edit history of the English version of Wikipedia between 2001 and 2005 ($\approx$ 700 GB) and an archive of UK governmental web sites ($\approx$ 400 GB). Our results demonstrate the feasibility of faster time-travel query processing with no space overhead.
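
The trade-off between random and sequential accesses mentioned in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' algorithm: the `Posting` layout, the per-shard start offset, the rule of charging one random access per shard, and the cost constants are all assumptions made for illustration. It only shows why merging two shards can pay off when a random access is much more expensive than a sequential read.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Posting:
    """Hypothetical index entry: a document version valid during [begin, end)."""
    doc_id: int
    begin: int
    end: int


def query_shard(shard: List[Posting], t: int) -> Tuple[List[int], int]:
    """Return (matching doc ids, entries read) for time point t.

    Assumption: the shard is sorted by begin time and a lookup structure gives
    the position of the first entry that is still valid at t. Reading then
    proceeds sequentially until an entry begins after t.
    """
    start = next((i for i, p in enumerate(shard) if p.end > t), len(shard))
    matches: List[int] = []
    entries_read = 0
    for p in shard[start:]:
        if p.begin > t:
            break
        entries_read += 1          # every entry scanned costs a sequential read
        if p.end > t:
            matches.append(p.doc_id)
    return matches, entries_read


def query_cost(shards: List[List[Posting]], t: int,
               random_cost: float, sequential_cost: float) -> Tuple[List[int], float]:
    """Toy cost: one random access per shard plus one sequential read per entry."""
    matches: List[int] = []
    total_read = 0
    for shard in shards:
        found, entries_read = query_shard(shard, t)
        matches.extend(found)
        total_read += entries_read
    cost = len(shards) * random_cost + total_read * sequential_cost
    return sorted(matches), cost


def merge(a: List[Posting], b: List[Posting]) -> List[Posting]:
    """Merge two shards into one, keeping the begin-time order."""
    return sorted(a + b, key=lambda p: p.begin)


# One long-lived version and two short-lived ones, kept in separate shards.
shard_a = [Posting(1, 0, 100)]
shard_b = [Posting(2, 10, 20), Posting(3, 30, 40)]
merged = merge(shard_a, shard_b)

separate_result = query_cost([shard_a, shard_b], 35, random_cost=10, sequential_cost=1)
merged_result = query_cost([merged], 35, random_cost=10, sequential_cost=1)
print(separate_result, merged_result)
```

For the query time 35, the two separate shards read two entries with two random accesses, while the merged shard reads three entries (one of them wasted) with a single random access. Under the assumed costs of 10 and 1, merging lowers the total from 22 to 13; if a random access cost only 0.5, the separate shards would be cheaper (3.0 versus 3.5).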


Bibliographic details

Authors: Anand, A.; Bedathur, S.; Berberich, K.; Schenkel, R.
Year: 2011
Original format: PDF
Language: English

Similar literature

1. On the efficiency of chemotactic pursuit - Comparing blind search with temporal and spatial gradient sensing [J]. Claus Metzner. Scientific Reports. 2019, No. 1
2. Exploiting Geographical and Temporal Locality to Boost Search Efficiency in Peer-to-Peer Systems [J]. Hailong Cai, Jun Wang. IEEE Transactions on Parallel and Distributed Systems. 2006
3. Resource-Efficient Index Shard Replication in Large Scale Search Engines [J]. Li Yusen, Tang Xueyan, Cai Wentong. IEEE Transactions on Parallel and Distributed Systems. 2019, No. 12
4. Temporal Index Sharding for Space-Time Efficiency in Archive Search [C]. Avishek Anand, Srikanta Bedathur, Klaus Berberich. International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011
5. Shards into Shards Redeemed [D]. Xavier, Caeden. 2020
6. On the efficiency of chemotactic pursuit - Comparing blind search with temporal and spatial gradient sensing [O]. Claus Metzner
7. Temporal Index Sharding for Space-Time Efficiency in Archive Search [O]. Avishek Anand, Srikanta Bedathur, Klaus Berberich. 2010
